What is the impact of socio-economic factors on Criminality in France?

Author

Pierre Monari, Elwin Freudiger

Published

November 28, 2023

1 Introduction

1.1 Overview and motivations

This project will analyze the factors influencing Criminality in France. We chose this Subject because we are both french and criminality consistently ranks as one of the most important priority for the French population. In our study, we wanted to focus on the social and economic factors contributing to this issue. France is often regarded as one of the major world powers, and yet, it is clear that similarly to the rest of the world, the french population is feeling more and more insecure regarding crime. Crime is impacting the political landscape very much. This subject has also made it’s way into everyday discussions as it proves to be a polarizing topic. In the last thirty years, Politicians have often promised to tackle crime through several means, and as the population wants more to be done against it, the government has often made it one of their top priority.

In recent times, there has been a noticeable rise in criminal activities across France. Regions such as “Seine Saint Denis” in the Paris area. has seen emboldened criminals committing crimes in broad daylight, contributing to a perception of increased chaos and increasing the feeling of insecurity. Similar concerns have surfaced in cities like Marseille. Additionally, it’s worth noting that the prison population has reached an all-time high in July 2023. But what are the reasons for increased criminality?

The goal of our research is to synthesize the main reasons behind criminality. Medias have often presented various reasons to explain crime, our research will look at those hints and see if they may have any statistical significance. It should be noted that data regarding “Ethnic origin” is not available, this variable is sometimes singled out as one of the driver of criminality and unlike in the United States, France has outlawed any form of statistics on this subject. as such, we will mostly focus on socio-economics drivers behind crime. France is facing substantial socio-economics issues, unemployment, poverty, conditions of leaving, and school level. Our project explores the link between socio-economic factors and criminality rates. Establishing such a link would give a new outlook on crime and its causes. Additionally we would like to know whether, this data could be used by the government to address public safety. Moreover, what are the core human drivers that push humans to commit crimes? How their socio-economic environment and their educational path may impact their ability to commit crimes?

1.2 Associated papers

https://www.institutpourlajustice.org/content/2017/12/Pauvreté-et-délinquance.pdf

1.3 Research questions

The purpose of this study is to evaluate the impact of different socioeconomic factors on criminality in France. We are using for that a french government database which divide crimes in 14 categories:

  1. Destruction and intentional damages
  2. Thefts without violence against people
  3. Residential burglaries
  4. Assault and battery
  5. Thefts from vehicles
  6. Domestic assault and battery
  7. Theft of vehicle accessories
  8. Other intentional assault and battery
  9. Sexual violence
  10. Violent robberies without weapons
  11. Drug trafficking
  12. Robbery with weapons

We will consider the following macroeconomic variables:

  • Poverty rate,
  • Unemployment rate
  • School indicators(Success rate at an exam/Share of adults without a dimploma)
  • Density of Population
  • Results of previous presidential election.

These are the following question we will focus on:

  • Can the criminality rate in France be explained using the unemployment rate?
  • Can the criminality rate in France be explained using the poverty rate?
  • Can the criminality rate in France be explained using education indicators ?
  • Can the density of population in territories explain criminality?
  • Can the immigration rate explain criminality?
  • Can the previous presidential results in territories have an impact on criminality?
  • Is there a correlation between these mentioned variables?

2 Data Wrangling

In this section,we will look at our raw datasets, what information they give us, how we wrangle/clean them to arrive at our final dataset. Lastly we will look at the mistakes in our data and any anomalies or outliers we may find.

Firstly, we will use a dataset with values by french state. This dataset will not be used for our Regression analysis as only having 95 observations is not enough to have a good statistical analysis.

2.1 Raw datasets

2.1.1 Data by Department

2.1.1.1 Population

There is one excel sheet per year, starting in 1975 and ending in 2023

Estimation de population au 1er janvier, par département, sexe et grande classe d'âge ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13 ...14 ...15 ...16 ...17 ...18 ...19 ...20
Année 2023 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Départements NA Ensemble NA NA NA NA NA Hommes NA NA NA NA NA Femmes NA NA NA NA NA
NA NA 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total 0 à 19 ans 20 à 39 ans 40 à 59 ans 60 à 74 ans 75 ans et plus Total
01 Ain 170727 149960 183127 109363 58760 671937 88293 75548 90943 52486 24308 331578 82434 74412 92184 56877 34452 340359
02 Aisne 125684 113454 132390 98737 52526 522791 64688 57214 65872 47242 20699 255715 60996 56240 66518 51495 31827 267076

This dataset gives us information regarding the population inside the dataset, the population is also divided by gender and age groups. We will only use the total number of people per department. It shoudl be noted that what we call our department number is a Character. This is due to the Corsica departments, split in two in 1975 and named 2A and 2B, they used to department number 20.

Source: Population by state

2.1.1.2 Crime

There is one excel sheet per department.

Index libellé index _2022_08 _2022_07 _2022_06 _2022_05 _2022_04 _2022_03 _2022_02 _2022_01 _2021_12 _2021_11 _2021_10 _2021_09 _2021_08 _2021_07 _2021_06 _2021_05 _2021_04 _2021_03 _2021_02 _2021_01 _2020_12 _2020_11 _2020_10 _2020_09 _2020_08 _2020_07 _2020_06 _2020_05 _2020_04 _2020_03 _2020_02 _2020_01 _2019_12 _2019_11 _2019_10 _2019_09 _2019_08 _2019_07 _2019_06 _2019_05 _2019_04 _2019_03 _2019_02 _2019_01 _2018_12 _2018_11 _2018_10 _2018_09 _2018_08 _2018_07 _2018_06 _2018_05 _2018_04 _2018_03 _2018_02 _2018_01 _2017_12 _2017_11 _2017_10 _2017_09 _2017_08 _2017_07 _2017_06 _2017_05 _2017_04 _2017_03 _2017_02 _2017_01 _2016_12 _2016_11 _2016_10 _2016_09 _2016_08 _2016_07 _2016_06 _2016_05 _2016_04 _2016_03 _2016_02 _2016_01 _2015_12 _2015_11 _2015_10 _2015_09 _2015_08 _2015_07 _2015_06 _2015_05 _2015_04 _2015_03 _2015_02 _2015_01 _2014_12 _2014_11 _2014_10 _2014_09 _2014_08 _2014_07 _2014_06 _2014_05 _2014_04 _2014_03 _2014_02 _2014_01 _2013_12 _2013_11 _2013_10 _2013_09 _2013_08 _2013_07 _2013_06 _2013_05 _2013_04 _2013_03 _2013_02 _2013_01 _2012_12 _2012_11 _2012_10 _2012_09 _2012_08 _2012_07 _2012_06 _2012_05 _2012_04 _2012_03 _2012_02 _2012_01 _2011_12 _2011_11 _2011_10 _2011_09 _2011_08 _2011_07 _2011_06 _2011_05 _2011_04 _2011_03 _2011_02 _2011_01 _2010_12 _2010_11 _2010_10 _2010_09 _2010_08 _2010_07 _2010_06 _2010_05 _2010_04 _2010_03 _2010_02 _2010_01 _2009_12 _2009_11 _2009_10 _2009_09 _2009_08 _2009_07 _2009_06 _2009_05 _2009_04 _2009_03 _2009_02 _2009_01 _2008_12 _2008_11 _2008_10 _2008_09 _2008_08 _2008_07 _2008_06 _2008_05 _2008_04 _2008_03 _2008_02 _2008_01 _2007_12 _2007_11 _2007_10 _2007_09 _2007_08 _2007_07 _2007_06 _2007_05 _2007_04 _2007_03 _2007_02 _2007_01 _2006_12 _2006_11 _2006_10 _2006_09 _2006_08 _2006_07 _2006_06 _2006_05 _2006_04 _2006_03 _2006_02 _2006_01 _2005_12 _2005_11 _2005_10 _2005_09 _2005_08 _2005_07 _2005_06 _2005_05 _2005_04 _2005_03 _2005_02 _2005_01 _2004_12 _2004_11 _2004_10 _2004_09 _2004_08 _2004_07 _2004_06 _2004_05 _2004_04 _2004_03 _2004_02 _2004_01 _2003_12 _2003_11 _2003_10 _2003_09 _2003_08 _2003_07 _2003_06 _2003_05 _2003_04 _2003_03 _2003_02 _2003_01 _2002_12 _2002_11 _2002_10 _2002_09 _2002_08 _2002_07 _2002_06 _2002_05 _2002_04 _2002_03 _2002_02 _2002_01 _2001_12 _2001_11 _2001_10 _2001_09 _2001_08 _2001_07 _2001_06 _2001_05 _2001_04 _2001_03 _2001_02 _2001_01 _2000_12 _2000_11 _2000_10 _2000_09 _2000_08 _2000_07 _2000_06 _2000_05 _2000_04 _2000_03 _2000_02 _2000_01 _1999_12 _1999_11 _1999_10 _1999_09 _1999_08 _1999_07 _1999_06 _1999_05 _1999_04 _1999_03 _1999_02 _1999_01 _1998_12 _1998_11 _1998_10 _1998_09 _1998_08 _1998_07 _1998_06 _1998_05 _1998_04 _1998_03 _1998_02 _1998_01 _1997_12 _1997_11 _1997_10 _1997_09 _1997_08 _1997_07 _1997_06 _1997_05 _1997_04 _1997_03 _1997_02 _1997_01 _1996_12 _1996_11 _1996_10 _1996_09 _1996_08 _1996_07 _1996_06 _1996_05 _1996_04 _1996_03 _1996_02 _1996_01
1 Règlements de compte entre malfaiteurs 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Homicides pour voler et à l'occasion de vols 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Homicides pour d'autres motifs 0 3 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 2 0 0 0 1 0 0 0 0 1 0 4 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 2 1 1 0 0 1 0 2 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 2 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 2 0 0 0 0 0 1 0 0 1 0 0 2 1 0 0 0 1 0 2 0 0 0 2 0 2 0 0 0 0 0 2 0 0 0 1 1 0 1 0 0 0 1 0 0 0 6 0 2 0 2 0 0 3 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 2 1 0 0 1 0 0 4 0 0 2 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 2 0 1 0 0 2 1 0 0 1 2 1 0 0 1 0 0 2 0 0 1 2 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 0 0 0 0 0 2 0 0 1 1 0 0 1 0 3 0 1 1 1 0 1 0 3 0 0 0
4 Tentatives d'homicides pour voler et à l'occasion de vols 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
5 Tentatives homicides pour d'autres motifs 0 4 0 1 1 0 1 0 2 0 0 0 0 3 1 0 0 0 1 0 0 1 1 0 3 0 1 0 3 0 0 0 2 0 0 0 0 1 0 0 0 1 0 2 0 1 0 2 0 1 0 1 3 0 1 1 0 0 0 1 1 0 2 5 2 0 1 0 1 0 1 1 0 3 1 3 4 0 0 0 0 0 1 1 0 1 0 0 2 0 2 0 0 0 0 1 0 1 0 2 0 0 0 1 0 0 2 0 0 1 0 1 0 0 0 0 0 0 2 0 0 0 0 1 0 0 1 0 0 0 2 0 0 0 0 0 2 0 0 0 0 1 2 0 1 0 1 1 0 0 3 0 1 0 1 0 1 0 0 0 0 0 1 1 7 0 0 5 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 3 1 0 1 4 0 0 0 0 0 3 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 2 1 0 0 1 1 2 0 0 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 1 4 0 1 1 0 1 0 1 1 0 0 1 0 0 0 2 2 1 0 0 0 0 0 0 0 1 0 0 0 1 0 2 1 0 3 1 1 0 1 0 0 1 1 2 2 0 1 1 0 1 0 1 0 0 1 1 0 0 2 1 0 0 0 1 0 0 1 0 1 0
6 Coups et blessures volontaires suivis de mort 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 3 0 1 0 1 0 0

This dataset is our main one, it tells us for example, that in the department “01” in August 2022, there was 1 case of counterfeit currency. we will be able to calculate the total number of crimes happening each month by department.

Source: Crime by state

2.1.1.3 Unemployement

Libellé idBank Dernière.mise.à.jour Période X1982.T1 X1982.T2 X1982.T3 X1982.T4 X1983.T1 X1983.T2 X1983.T3 X1983.T4 X1984.T1 X1984.T2 X1984.T3 X1984.T4 X1985.T1 X1985.T2 X1985.T3 X1985.T4 X1986.T1 X1986.T2 X1986.T3 X1986.T4 X1987.T1 X1987.T2 X1987.T3 X1987.T4 X1988.T1 X1988.T2 X1988.T3 X1988.T4 X1989.T1 X1989.T2 X1989.T3 X1989.T4 X1990.T1 X1990.T2 X1990.T3 X1990.T4 X1991.T1 X1991.T2 X1991.T3 X1991.T4 X1992.T1 X1992.T2 X1992.T3 X1992.T4 X1993.T1 X1993.T2 X1993.T3 X1993.T4 X1994.T1 X1994.T2 X1994.T3 X1994.T4 X1995.T1 X1995.T2 X1995.T3 X1995.T4 X1996.T1 X1996.T2 X1996.T3 X1996.T4 X1997.T1 X1997.T2 X1997.T3 X1997.T4 X1998.T1 X1998.T2 X1998.T3 X1998.T4 X1999.T1 X1999.T2 X1999.T3 X1999.T4 X2000.T1 X2000.T2 X2000.T3 X2000.T4 X2001.T1 X2001.T2 X2001.T3 X2001.T4 X2002.T1 X2002.T2 X2002.T3 X2002.T4 X2003.T1 X2003.T2 X2003.T3 X2003.T4 X2004.T1 X2004.T2 X2004.T3 X2004.T4 X2005.T1 X2005.T2 X2005.T3 X2005.T4 X2006.T1 X2006.T2 X2006.T3 X2006.T4 X2007.T1 X2007.T2 X2007.T3 X2007.T4 X2008.T1 X2008.T2 X2008.T3 X2008.T4 X2009.T1 X2009.T2 X2009.T3 X2009.T4 X2010.T1 X2010.T2 X2010.T3 X2010.T4 X2011.T1 X2011.T2 X2011.T3 X2011.T4 X2012.T1 X2012.T2 X2012.T3 X2012.T4 X2013.T1 X2013.T2 X2013.T3 X2013.T4 X2014.T1 X2014.T2 X2014.T3 X2014.T4 X2015.T1 X2015.T2 X2015.T3 X2015.T4 X2016.T1 X2016.T2 X2016.T3 X2016.T4 X2017.T1 X2017.T2 X2017.T3 X2017.T4 X2018.T1 X2018.T2 X2018.T3 X2018.T4 X2019.T1 X2019.T2 X2019.T3 X2019.T4 X2020.T1 X2020.T2 X2020.T3 X2020.T4 X2021.T1 X2021.T2 X2021.T3 X2021.T4 X2022.T1 X2022.T2 X2022.T3 X2022.T4 X2023.T1 X2023.T2
Taux de chômage localisé par région - France métropolitaine 1515842 29/09/2023 12:00 NA 6.7 6.8 7.0 7.0 7.0 7.0 7.2 7.5 7.9 8.3 8.5 8.7 8.8 8.8 8.8 8.8 8.7 8.8 8.8 8.9 9.0 9.0 8.8 8.8 8.7 8.6 8.5 8.3 8.1 7.9 7.8 7.8 7.7 7.7 7.6 7.6 7.6 7.7 8.0 8.2 8.4 8.6 8.8 9.0 9.2 9.5 9.8 10.1 10.3 10.4 10.3 10.1 9.8 9.6 9.5 9.6 9.9 10.1 10.2 10.2 10.3 10.3 10.3 10.1 9.9 9.8 9.7 9.8 9.8 9.8 9.5 9.1 8.7 8.3 8.0 7.7 7.4 7.3 7.3 7.4 7.5 7.5 7.5 7.6 8.0 8.1 8.0 8.4 8.6 8.4 8.5 8.5 8.3 8.5 8.7 8.7 8.8 8.6 8.6 8.0 8.2 7.8 7.7 7.1 6.9 7.0 7.1 7.4 8.2 8.9 8.9 9.1 9.0 8.9 8.9 8.8 8.8 8.7 8.9 9.0 9.1 9.4 9.4 9.8 10.0 10.1 9.9 9.8 9.8 9.8 9.9 10.1 10.0 10.2 10.0 9.9 9.9 9.7 9.6 9.8 9.3 9.2 9.2 8.7 9.0 8.8 8.6 8.4 8.5 8.1 8.1 7.9 7.7 7.0 8.8 7.9 8.0 7.7 7.7 7.2 7.1 7.1 7.1 7.0 6.9 6.9
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD
Taux de chômage localisé par région - Île-de-France 1515843 29/09/2023 12:00 NA 5.6 5.7 5.7 5.7 5.6 5.6 5.8 6.0 6.2 6.5 6.7 6.8 6.9 7.0 7.2 7.3 7.3 7.3 7.3 7.4 7.5 7.5 7.5 7.4 7.4 7.3 7.3 7.1 6.9 6.7 6.6 6.5 6.5 6.5 6.4 6.4 6.4 6.6 6.9 7.0 7.1 7.3 7.6 7.9 8.2 8.6 8.9 9.2 9.3 9.3 9.2 9.1 8.9 8.8 8.7 8.8 9.0 9.2 9.2 9.2 9.3 9.4 9.4 9.2 9.1 9.0 8.9 8.9 8.9 8.8 8.5 8.2 7.8 7.4 7.1 6.8 6.5 6.4 6.5 6.6 6.8 7.1 7.3 7.4 7.9 8.1 8.1 8.4 8.7 8.5 8.6 8.6 8.3 8.4 8.5 8.4 8.4 8.2 8.1 7.5 7.7 7.4 7.2 6.6 6.3 6.4 6.4 6.5 7.1 7.6 7.8 8.1 8.0 7.9 7.9 7.9 7.8 7.8 7.9 8.0 8.1 8.3 8.3 8.5 8.7 8.8 8.7 8.7 8.7 8.7 8.8 9.0 8.8 9.0 8.9 8.8 8.8 8.6 8.5 8.7 8.3 8.2 8.3 7.8 8.1 8.0 7.8 7.7 7.6 7.4 7.4 7.2 7.0 6.3 8.3 7.7 7.8 7.5 7.6 7.1 7.0 7.0 6.9 6.8 6.6 6.7
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD
Taux de chômage localisé par région - Centre-Val de Loire 1515847 29/09/2023 12:00 NA 5.7 5.7 5.9 6.0 6.0 6.0 6.3 6.7 7.1 7.4 7.7 8.0 8.0 8.0 8.1 8.0 8.1 8.2 8.1 8.3 8.4 8.4 8.2 8.0 7.9 7.7 7.6 7.4 7.2 7.0 6.8 6.8 6.6 6.4 6.4 6.4 6.4 6.5 6.8 7.0 7.3 7.5 7.6 7.8 8.0 8.4 8.7 9.0 9.2 9.2 9.2 8.9 8.6 8.3 8.1 8.3 8.6 8.8 9.0 9.1 9.2 9.0 8.9 8.7 8.5 8.4 8.3 8.3 8.3 8.2 7.9 7.5 7.1 6.7 6.5 6.2 6.0 5.9 5.9 6.1 6.1 6.2 6.3 6.4 6.7 7.0 7.0 7.4 7.6 7.4 7.5 7.6 7.4 7.5 7.7 7.7 7.8 7.6 7.6 7.1 7.1 6.7 6.7 6.2 5.9 6.1 6.3 6.6 7.5 8.1 8.1 8.4 8.4 8.3 8.3 8.2 8.1 8.0 8.3 8.4 8.5 8.8 9.0 9.3 9.5 9.6 9.5 9.4 9.4 9.5 9.6 9.8 9.6 9.8 9.7 9.6 9.7 9.5 9.4 9.5 9.1 9.0 8.9 8.4 8.7 8.5 8.5 8.3 8.4 8.0 8.0 7.8 7.4 6.8 8.2 7.2 7.4 7.2 7.2 6.7 6.8 6.9 6.8 6.7 6.6 6.7
Codes NA NA A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A SD

Source: Unemployement by state

2.1.1.4 Middle school final exam

Session Numero.d.etablissement Type.d.etablissement Patronyme Secteur.d.enseignement Commune Libellé.commune Code.département Libellé.département Code.académie Libellé.académie Code.région Libellé.région Inscrits Presents Admis Admis.sans.mention Nombre_d_admis_Mention_AB Admis.Mention.bien Admis.Mention.très.bien Taux.de.réussite
2008 0381810L COLLEGE GERARD PHILIPE PUBLIC 38169 FONTAINE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 97 94 57 20 20 16 1 60,60%
2008 0381818V LYCEE PROFESSIONNEL - PUBLIC 38563 VOIRON 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 19 19 19 13 5 1 0 100,00%
2008 0381888W LYCEE PROFESSIONNEL - PUBLIC 38509 LA TOUR-DU-PIN 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 24 24 22 11 8 3 0 91,60%
2008 0382099A LYCEE ROGER DESCHAUX PUBLIC 38474 SASSENAGE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 24 22 11 4 7 0 0 50,00%
2008 0382100B COLLEGE GRESIVAUDAN PUBLIC 38397 SAINT-ISMIER 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 193 190 174 38 57 45 34 91,50%
2008 0382104F COLLEGE LOUIS ARAGON PUBLIC 38553 VILLEFONTAINE 038 ISERE 8 GRENOBLE 1 AUVERGNE-RHONE-ALPES 90 86 53 33 9 8 3 61,60%

At the end of middle school in France, sutdents must complete an exam and if they succed, the students will receive their “brevet des collèges” and can continue to high school. This would be similar to the “Certificat” in Switzerland. Here this database gives us results for each middle school in France. The first column is exam year. Column 2 to 13 tell us where and what type of school it was. After column 13, we find how many students registered, how many came and how many students obtained the “Brevet”. Lastly, we get more information regarding how well the students did. A “mention” can be obtained if the students acheive a certain average grade. Lastly, the success rate is calculated by dividing the number of students who obtained the diploma by the number of students who came the day of the exam. We will not use this sucess rate as we believe that computing the success rate by diving students who obtained the degree with registered students would give us more information regarding education levels in France.

Source: Middle school exam results by state

2.1.1.5 Election Results

Code du département Libellé du département Inscrits Abstentions % Abs/Ins Votants % Vot/Ins Blancs % Blancs/Ins % Blancs/Vot Nuls % Nuls/Ins % Nuls/Vot Exprimés % Exp/Ins % Exp/Vot Sexe...17 Nom...18 Prénom...19 Voix...20 % Voix/Ins...21 % Voix/Exp...22 Sexe...23 Nom...24 Prénom...25 Voix...26 % Voix/Ins...27 % Voix/Exp...28
1 Ain 415950 93130 22.39 322820 77.61 28852 6.94 8.94 8738 2.10 2.71 285230 68.57 88.36 M MACRON Emmanuel 173809 41.79 60.94 F LE PEN Marine 111421 26.79 39.06
2 Aisne 375791 90745 24.15 285046 75.85 22838 6.08 8.01 9067 2.41 3.18 253141 67.36 88.81 M MACRON Emmanuel 119202 31.72 47.09 F LE PEN Marine 133939 35.64 52.91
3 Allier 253479 59294 23.39 194185 76.61 18877 7.45 9.72 8522 3.36 4.39 166786 65.80 85.89 M MACRON Emmanuel 106579 42.05 63.90 F LE PEN Marine 60207 23.75 36.10
4 Alpes-de-Haute-Provence 126459 29255 23.13 97204 76.87 9671 7.65 9.95 3722 2.94 3.83 83811 66.28 86.22 M MACRON Emmanuel 48994 38.74 58.46 F LE PEN Marine 34817 27.53 41.54
5 Hautes-Alpes 109892 24895 22.65 84997 77.35 8670 7.89 10.20 2699 2.46 3.18 73628 67.00 86.62 M MACRON Emmanuel 47211 42.96 64.12 F LE PEN Marine 26417 24.04 35.88
6 Alpes-Maritimes 761780 198631 26.07 563149 73.93 47784 6.27 8.49 12414 1.63 2.20 502951 66.02 89.31 M MACRON Emmanuel 278407 36.55 55.35 F LE PEN Marine 224544 29.48 44.65

We take this sheet for the second round of election results in 2017. This second round opposed Emmanuel Macron to Marine Lepen. In this, dataset, we have the number of citizen with voting rights, how many voted, blank ballots, invalid ballots and ballots for each candidates. We also have some basic information on the candidate, such as name and gender. Lastly we also have various ratios, with one being of great interest, the percentage of votes that Marine Lepen received. We decided to take the 2017 election because with our year being 2019, seeing the “impact” of the last election on crime could yield interesting results.

Source: 2017 Election results by state

2.1.1.6 Immigration

...1 Immigrés Non immigrés Ensemble...4 Etrangers Français Ensemble...7
Ain 78968 573464 652432 592775 59657 652432
Aisne 25414 505931 531345 512545 18800 531345
Allier 17423 318552 335975 324064 11911 335975
Alpes-de-Haute-Provence 13309 150999 164308 155227 9081 164308
Hautes-Alpes 8192 133028 141220 135854 5366 141220
Alpes-Maritimes 162232 932051 1094283 972043 122240 1094283

Here we take this sheet and look at the number of immigrants by state. According to the documentation, an immigrant is a person born outside of France and living in France, while a foreigner is someone who does not have the french nationality. It should be noted that the label for foreigners and French citizens appears to have been switched in the dataframe. Fortunately, we will only use the number of immigrants for our study.

Source: Immigration numbers by departement

2.1.1.7 Population Density

Département ...2 Densité
01 Ain 113
02 Aisne 72
03 Allier 46
04 Alpes-de-Haute-Provence 24
05 Hautes-Alpes 25
06 Alpes-Maritimes 255

Source: Population Density by state

2.1.2 Data by town

2.1.2.1 Crime

CODGEO_2023 annee classe unité.de.compte valeur.publiée faits tauxpourmille complementinfoval complementinfotaux POP millPOP LOG millLOG
01001 16 Autres coups et blessures volontaires victime ndiff NA NA 1,4139534883720930 0,8994774745986377 767 16 3,48000000000000e+02 16
01001 17 Autres coups et blessures volontaires victime ndiff NA NA 1,4744186046511627 0,9308908948886161 776 17 3,53730434782610e+02 17
01001 18 Autres coups et blessures volontaires victime ndiff NA NA 1,6744186046511629 1,0494250025506857 771 18 3,53730434782608e+02 18
01001 19 Autres coups et blessures volontaires victime diff 0 0,00000000000000e+00 NA NA 779 19 3,59574601708852e+02 19
01001 20 Autres coups et blessures volontaires victime ndiff NA NA 1,4870689655172413 0,9798018812196120 806 20 3,66185064822065e+02 20
01001 21 Autres coups et blessures volontaires victime ndiff NA NA 1,7098214285714286 1,1468851437948424 806 20 3,66185064822065e+02 20
01001 22 Autres coups et blessures volontaires victime ndiff NA NA 1,8088888888888890 1,2238135473453471 806 20 3,66185064822065e+02 20
01004 16 Autres coups et blessures volontaires victime diff 39 2,76968965272353e+00 NA NA 14081 16 7,12611602800044e+03 16
01004 17 Autres coups et blessures volontaires victime diff 38 2,70751692198076e+00 NA NA 14035 17 7,21711239192120e+03 17
01004 18 Autres coups et blessures volontaires victime diff 46 3,23852435933540e+00 NA NA 14204 18 7,34158489570647e+03 18

This dataset is quite big with almost 3 million rows. The first column is the CODGEO, this is a towncode by the INSEE(French institute of statistics and economics studies) the first 2 numbers are the department that the town is located in and the last 3 numbers are the town number. This first column will be very useful because it is a unique number for each town and we will use it as our key. The second column is our year, the third is the type of crime commited. The fourth column “valeur.publiée” tells us if the town publishes their crime statistics or not. It should be noted that some town are coded as publishing although the value is missing. We will dive into this later in the EDA part of our report. We then have a column telling us the total number of crimes and one for crime rate for a thousand people. The remaining columns will not be used in our analysis. The remaining LOG variables are for households in the town and the mill is a “millésime” method of census that allows a census to be conducted every year at a smaller scale, this is useful to track changes in the population year by year, but it will not be used by us.

Lastly, the rules for publishing data are the following: crime data is only published if the town records more than 5 instances of crime for 3 consecutive years. This is done to protect the privacy of the people involved and because the statistical analysis of such small data would not yield satisfying results.

Source: Crime by town

2.1.2.2 Density

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27
1 01 ozan OZAN ozan Ozan O250 OSN 01190 284 01284 2 26 6 618 469 500 93 6.60 4.91667 46.3833 2866 51546 +45456 462330 170 205
2 01 cormoranche-sur-saone CORMORANCHE-SUR-SAONE cormoranche sur saone Cormoranche-sur-Saône C65652625 KRMRNXSRSN 01290 123 01123 2 27 6 1058 903 1000 107 9.85 4.83333 46.2333 2772 51379 +44953 461427 168 211
3 01 plagne-01 PLAGNE plagne Plagne P425 PLKN 01130 298 01298 4 03 6 129 83 100 20 6.20 5.73333 46.1833 3769 51324 +54342 461131 560 922
4 01 tossiat TOSSIAT tossiat Tossiat T230 TST 01250 422 01422 2 25 6 1406 1111 1400 138 10.17 5.31667 46.1333 3309 51268 +51854 460828 244 501
5 01 pouillat POUILLAT pouillat Pouillat P430 PLT 01250 309 01309 2 33 6 88 58 100 14 6.23 5.43333 46.3333 3435 51475 +52542 461938 333 770
6 01 torcieu TORCIEU torcieu Torcieu T620 TRS 01230 421 01421 1 28 6 698 643 700 65 10.72 5.40000 45.9167 3398 51025 +52343 455521 257 782

This dataframe contains various information regarding french towns. Here, can be found the size of each town, the population but also the altitude for example. Here we will use the population in 2010 and the size to compute the density.

Source: Density by town

2.1.2.3 Population by town

NIVGEO CODGEO LIBGEO SEXE AGEPYR10 NB
COM 01001 L'Abergement-Clémenciat 1 00 14.812696
COM 01001 L'Abergement-Clémenciat 2 00 13.905300
COM 01001 L'Abergement-Clémenciat 1 03 10.848149
COM 01001 L'Abergement-Clémenciat 2 03 6.856037
COM 01001 L'Abergement-Clémenciat 1 06 23.655840
COM 01001 L'Abergement-Clémenciat 2 06 32.462397

This dataset is quite simple, we have a column indicating the level, here is is towns, the CODGEO mentioned above, the town name, followed by the gender and age group. finally we have the number of inhabitants that fit these criterias.

Source: Density by town

2.1.2.4 Population by diploma

Région en géographie courante Département en géographie courante Commune en géographie courante Indicateur de stabilité de la commune Département en géographie 2021 Libellé de commune Aucun diplôme Hommes 16 à 24 ans RP2019 Aucun diplôme Hommes 25 ans ou plus RP2019 Aucun diplôme Femmes 16 à 24 ans RP2019 Aucun diplôme Femmes 25 ans ou plus RP2019 Diplôme de niveau CEP Hommes 16 à 24 ans RP2019 Diplôme de niveau CEP Hommes 25 ans ou plus RP2019 Diplôme de niveau CEP Femmes 16 à 24 ans RP2019 Diplôme de niveau CEP Femmes 25 ans ou plus RP2019 Diplôme de niveau BEPC Hommes 16 à 24 ans RP2019 Diplôme de niveau BEPC Hommes 25 ans ou plus RP2019 Diplôme de niveau BEPC Femmes 16 à 24 ans RP2019 Diplôme de niveau BEPC Femmes 25 ans ou plus RP2019 Diplôme de niveau CAP-BEP Hommes 16 à 24 ans RP2019 Diplôme de niveau CAP-BEP Hommes 25 ans ou plus RP2019 Diplôme de niveau CAP-BEP Femmes 16 à 24 ans RP2019 Diplôme de niveau CAP-BEP Femmes 25 ans ou plus RP2019 Diplôme de niveau bac général ou technique Hommes 16 à 24 ans RP2019 Diplôme de niveau bac général ou technique Hommes 25 ans ou plus RP2019 Diplôme de niveau bac général ou technique Femmes 16 à 24 ans RP2019 Diplôme de niveau bac général ou technique Femmes 25 ans ou plus RP2019 Diplôme universitaire de 1er cycle Hommes 16 à 24 ans RP2019 Diplôme universitaire de 1er cycle Hommes 25 ans ou plus RP2019 Diplôme universitaire de 1er cycle Femmes 16 à 24 ans RP2019 Diplôme universitaire de 1er cycle Femmes 25 ans ou plus RP2019 Diplôme universitaire de 2ème ou 3ème cycle Hommes 16 à 24 ans RP2019 Diplôme universitaire de 2ème ou 3ème cycle Hommes 25 ans ou plus RP2019 Diplôme universitaire de 2ème ou 3ème cycle Femmes 16 à 24 ans RP2019 Diplôme universitaire de 2ème ou 3ème cycle Femmes 25 ans ou plus RP2019
RR DR CR STABLE DR21 LIBELLE dpx_rec0s1age2_rec1rpop2019 dpx_rec0s1age2_rec2rpop2019 dpx_rec0s2age2_rec1rpop2019 dpx_rec0s2age2_rec2rpop2019 dpx_rec1s1age2_rec1rpop2019 dpx_rec1s1age2_rec2rpop2019 dpx_rec1s2age2_rec1rpop2019 dpx_rec1s2age2_rec2rpop2019 dpx_rec2s1age2_rec1rpop2019 dpx_rec2s1age2_rec2rpop2019 dpx_rec2s2age2_rec1rpop2019 dpx_rec2s2age2_rec2rpop2019 dpx_rec3s1age2_rec1rpop2019 dpx_rec3s1age2_rec2rpop2019 dpx_rec3s2age2_rec1rpop2019 dpx_rec3s2age2_rec2rpop2019 dpx_rec4s1age2_rec1rpop2019 dpx_rec4s1age2_rec2rpop2019 dpx_rec4s2age2_rec1rpop2019 dpx_rec4s2age2_rec2rpop2019 dpx_rec5s1age2_rec1rpop2019 dpx_rec5s1age2_rec2rpop2019 dpx_rec5s2age2_rec1rpop2019 dpx_rec5s2age2_rec2rpop2019 dpx_rec6s1age2_rec1rpop2019 dpx_rec6s1age2_rec2rpop2019 dpx_rec6s2age2_rec1rpop2019 dpx_rec6s2age2_rec2rpop2019
84 01 001 1 01 Abergement-Clémenciat 0 31.273727679978059 0 26.319004906244121 0 36.504516945208763 0 42.168740366590413 4.8204307590319528 25.195300294679452 0 25.897214253562552 0 72.042804073821756 10.047478793216261 66.528046044671314 9.9094455474678735 51.11132775508888 4.9547227737339368 50.935251110259628 10.185512038964646 25.624888993112183 0 35.403682355195734 0 45.715893991511749 0 30.43378047534657
84 01 002 1 01 Abergement-De-Varey 0 0 0 5.5320444734237153 0 5.1374280962262988 0 10.402372543219041 0 5.2011862716095205 0 0 0 31.540359468699528 0 10.466921990976699 0 31.273259442268557 0 26.136622618416695 0 15.73425007519765 0 10.596821978971352 0 10.532272531213694 0 21.129094510185048
84 01 003 0 01 Amareins NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
84 01 004 1 01 Ambérieu-En-Bugey 76.83122693441176 643.20453199840733 35.464379173014933 860.73512425945796 0 209.96947893832353 0 432.8132259662828 42.914918277527164 207.23820326380567 52.610753884473795 355.55757783628127 181.41495713237916 1382.3332572324048 122.86736219321433 1245.913072797086 156.88926046556423 822.49605569888286 188.34449242818874 832.16467035277583 50.77482939272798 638.97964337399492 50.046583285767468 536.94746995634785 39.280654212696781 593.41118637421562 47.035995912555691 837.01126571351983
84 01 005 1 01 Ambérieux-En-Dombes 0 83.521324354657963 0 44.217171717171972 0 19.652076318743237 0 63.869248035915504 9.8260381593715458 39.304152637485977 0 49.130190796858017 4.913019079685748 240.73793490460261 4.913019079685748 167.04264870931641 4.9130190796857969 88.434343434343873 9.8260381593715458 152.30359147025854 0 54.043209876543123 4.9130190796856503 93.347362514029356 0 93.347362514029072 4.913019079685748 83.521324354657537

In this dataset, column 1 to 6 are the location of the town, with indicators such as the region, the departement and the town number. Then the population is divided by gender and by diploma obtained, we will only focus on the total population over 25 who doesn’t have any diploma. The whole excel files gives us this information for every year starting in 1968. We only take the year 2019.

Source: Population by diploma by town

2.1.2.5 2017 Election results

Code du département Libellé du département Code de la commune Libellé de la commune Inscrits Abstentions % Abs/Ins Votants % Vot/Ins Blancs % Blancs/Ins % Blancs/Vot Nuls % Nuls/Ins % Nuls/Vot Exprimés % Exp/Ins % Exp/Vot N°Panneau...19 Sexe...20 Nom...21 Prénom...22 Voix...23 % Voix/Ins...24 % Voix/Exp...25 N°Panneau...26 Sexe...27 Nom...28 Prénom...29 Voix...30 % Voix/Ins...31 % Voix/Exp...32
1 Ain 1 L'Abergement-Clémenciat 598 100 16.72 498 83.28 37 6.19 7.43 8 1.34 1.61 453 75.75 90.96 1 M MACRON Emmanuel 272 45.48 60.04 2 F LE PEN Marine 181 30.27 39.96
1 Ain 2 L'Abergement-de-Varey 209 32 15.31 177 84.69 21 10.05 11.86 6 2.87 3.39 150 71.77 84.75 1 M MACRON Emmanuel 93 44.50 62.00 2 F LE PEN Marine 57 27.27 38.00
1 Ain 4 Ambérieu-en-Bugey 8586 2312 26.93 6274 73.07 601 7.00 9.58 217 2.53 3.46 5456 63.55 86.96 1 M MACRON Emmanuel 3213 37.42 58.89 2 F LE PEN Marine 2243 26.12 41.11
1 Ain 5 Ambérieux-en-Dombes 1172 259 22.10 913 77.90 77 6.57 8.43 24 2.05 2.63 812 69.28 88.94 1 M MACRON Emmanuel 423 36.09 52.09 2 F LE PEN Marine 389 33.19 47.91
1 Ain 6 Ambléon 99 23 23.23 76 76.77 17 17.17 22.37 0 0.00 0.00 59 59.60 77.63 1 M MACRON Emmanuel 30 30.30 50.85 2 F LE PEN Marine 29 29.29 49.15
1 Ain 7 Ambronay 1880 336 17.87 1544 82.13 138 7.34 8.94 31 1.65 2.01 1375 73.14 89.05 1 M MACRON Emmanuel 774 41.17 56.29 2 F LE PEN Marine 601 31.97 43.71

This dataset is very similar to the one for departments.

Source: 2017 Election results by town

2.1.2.6 Poverty by town

CODGEO TP4019 TP5019 TP6019 TP60Q219 TP60IP19 AGE1TP4019 AGE1TP5019 AGE1TP6019 AGE1TP60Q219 AGE1TP60IP19 AGE2TP4019 AGE2TP5019 AGE2TP6019 AGE2TP60Q219 AGE2TP60IP19 AGE3TP4019 AGE3TP5019 AGE3TP6019 AGE3TP60Q219 AGE3TP60IP19 AGE4TP4019 AGE4TP5019 AGE4TP6019 AGE4TP60Q219 AGE4TP60IP19 AGE5TP4019 AGE5TP5019 AGE5TP6019 AGE5TP60Q219 AGE5TP60IP19 AGE6TP4019 AGE6TP5019 AGE6TP6019 AGE6TP60Q219 AGE6TP60IP19 TME1TP4019 TME1TP5019 TME1TP6019 TME1TP60Q219 TME1TP60IP19 TME2TP4019 TME2TP5019 TME2TP6019 TME2TP60Q219 TME2TP60IP19 TME3TP4019 TME3TP5019 TME3TP6019 TME3TP60Q219 TME3TP60IP19 TME4TP4019 TME4TP5019 TME4TP6019 TME4TP60Q219 TME4TP60IP19 TME5TP4019 TME5TP5019 TME5TP6019 TME5TP60Q219 TME5TP60IP19 TOL1TP4019 TOL1TP5019 TOL1TP6019 TOL1TP60Q219 TOL1TP60IP19 TOL2TP4019 TOL2TP5019 TOL2TP6019 TOL2TP60Q219 TOL2TP60IP19 TLD2TP4019 TLD2TP5019 TLD2TP6019 TLD2TP60Q219 TLD2TP60IP19 TLD3TP4019 TLD3TP5019 TLD3TP6019 TLD3TP60Q219 TLD3TP60IP19 TYM1TP4019 TYM1TP5019 TYM1TP6019 TYM1TP60Q219 TYM1TP60IP19 TYM2TP4019 TYM2TP5019 TYM2TP6019 TYM2TP60Q219 TYM2TP60IP19 TYM3TP4019 TYM3TP5019 TYM3TP6019 TYM3TP60Q219 TYM3TP60IP19 TYM4TP4019 TYM4TP5019 TYM4TP6019 TYM4TP60Q219 TYM4TP60IP19 TYM5TP4019 TYM5TP5019 TYM5TP6019 TYM5TP60Q219 TYM5TP60IP19 TYM6TP4019 TYM6TP5019 TYM6TP6019 TYM6TP60Q219 TYM6TP60IP19 OPR1TP4019 OPR1TP5019 OPR1TP6019 OPR1TP60Q219 OPR1TP60IP19 OPR2TP4019 OPR2TP5019 OPR2TP6019 OPR2TP60Q219 OPR2TP60IP19 OPR3TP4019 OPR3TP5019 OPR3TP6019 OPR3TP60Q219 OPR3TP60IP19 OPR4TP4019 OPR4TP5019 OPR4TP6019 OPR4TP60Q219 OPR4TP60IP19 OPR5TP4019 OPR5TP5019 OPR5TP6019 OPR5TP60Q219 OPR5TP60IP19
01004 16 22 28 7980 0.385 17 23 30 7950 0.388 23 31 37 7500 0.423 19 28 35 8060 0.379 14 19 23 7180 0.447 NA 10 15 9330 0.281 NA NA NA NA NA 12 17 22 7750 0.404 9 12 16 7930 0.389 13 20 26 8430 0.351 14 21 27 8420 0.352 34 45 53 7560 0.418 5 6 9 9460 0.271 26 35 43 7520 0.421 37 49 58 7170 0.448 14 19 28 8580 0.34 NA NA 20 7030 0.459 NA 18 24 8430 0.351 NA NA 9 9160 0.295 17 24 31 8280 0.362 31 40 49 6920 0.467 NA NA NA NA NA 12 18 25 8960 0.310 67 NA NA 5980 0.539 NA NA NA NA NA 8 12 17 8760 0.326 NA NA NA NA NA
01007 NA 9 13 9400 0.276 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 38 8990 0.308 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12 10020 0.229 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01014 8 13 19 9390 0.277 NA NA NA NA NA NA NA NA NA NA NA NA 25 9690 0.254 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 34 9430 0.274 NA NA 10 9790 0.246 17 27 37 8920 0.313 NA 30 41 9050 0.303 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 13 20 9480 0.270 NA NA NA NA NA NA NA NA NA NA NA 10 17 10010 0.229 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01024 NA 8 12 9430 0.274 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 10000 0.230 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11 8880 0.316 NA NA NA NA NA NA NA NA NA NA NA NA 11 10410 0.198 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01025 5 7 11 9020 0.305 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 7 9760 0.249 NA NA 34 8310 0.360 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 10 8400 0.353 NA NA NA NA NA NA NA NA NA NA NA NA 9 9690 0.254 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
01031 17 23 29 7930 0.389 NA NA NA NA NA NA 37 44 7660 0.410 23 32 40 8020 0.383 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 35 47 56 7310 0.437 NA NA NA NA NA 28 37 47 7850 0.396 30 41 52 7920 0.390 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 19 26 34 8290 0.361 NA NA NA 6640 0.489 NA NA NA NA NA 13 20 28 8780 0.324 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

This dataset does not have easily understable column names. The first is CODGEO, the town code. The columns starting with “TP” indicate poverty rates. the number afterwards indicate the rate at which people are considered poor. For example, with 60% which is the commonly used cutoff for poverty, the number means that a certain percentage of people earn only 60% of the median revenue in France. The column “TP60IP19” is the poverty intensity. basically how far from the median revenue are those poor people. We will also use this infomarmation. The remaining columns divide those values by age

Source: Poverty by town

2.1.2.7 Immigration by town

NIVGEO CODGEO LIBGEO AGE4 IMMI SEXE NB
COM 01001 L'Abergement-Clémenciat 00 2 1 82.647627
COM 01001 L'Abergement-Clémenciat 15 1 1 1.018551
COM 01001 L'Abergement-Clémenciat 15 2 1 39.016201
COM 01001 L'Abergement-Clémenciat 25 1 1 3.964526
COM 01001 L'Abergement-Clémenciat 25 2 1 145.677246
COM 01001 L'Abergement-Clémenciat 55 1 1 7.300113

Here we have the number of immigrants by town, age and gender. The first 3 columns are town indicators, code and name. the AGE4 variable indicate the age group. IMMI is the immigration status, 1 for immigrant, and 2 for non immigrants. SEXE is the gender, 1 for men and 2 for women and NB is the total number.

2.2 Wrangling

2.2.1 By department

2.2.1.1 List of departements.

We create a list of states with their numbers using the population dataset.

Code
List_depart <- List_depart[5:100, 1:2]
colnames(List_depart) <- c("Dep_number", "Dep_name")

To do this, we select our states and only the name and state code from our Population dataset. For reasons stated above, we only select metropolitan states. This means only selecting the first 96 states. We then rename our columns with easy to remember names that will be used as our key in the future.

2.2.1.2 Population by year and department

We will create 2 datasets.

Code
years_pop <- as.character(1975:2023) #every year we need to extract a sheet
Pop_by_year <- List_depart[1] #list with depart numbers 
Col_name <-  paste0("pop_", years_pop) #List to rename the columns


for (i in 1:length(years_pop)) {
  pop <- 
    read_excel(here::here("Raw_data/population par département.xls"), 
               sheet = years_pop[i]) 
  Pop_by_year <- cbind(Pop_by_year, pop[5:100, 8])#add them to the df
  colnames(Pop_by_year)[i+1] <- Col_name[i] #rename cols
}

our first dataset is the population by years. To create this, we create a vector of all the years present in the excel file. using a for loop, we open each excel sheet for each year, add them to our list of departments dataset created previously and rename it to a standard name. This dataset will be useful for our graphical representations.

we then, we take this dataset and only keep the population in 2019 and the department number. We will use this for our clustering analysis.

Code
Pop_2019 <- Pop_by_year |> 
  select(Dep_number, pop_2019)
View(Pop_2019)

2.2.1.3 Crime

Code
years_crime <- 1996:2021
Crime_total <- t(data.frame(years_crime))
colnames(Crime_total) <- Crime_total[1,]

view(Crime_total)

for (i in 1:nrow(List_depart)) {
  crimes_depart <- read_excel(here::here("Raw_data/crimes_depart.xlsx"), 
                              sheet = List_depart$Dep_number[i]) 
  #Load each datasheet with the department list
  
  crimes_depart <- crimes_depart |>
    select(-(1:10)) |> 
    summarise_all(sum) |>
    pivot_longer(cols = everything(), names_to = "Year", values_to = "Dep_number") |>
    mutate(Year = substr(Year, 2, 5)) |>
    group_by(Year) |>
    summarize(Dep_number = sum(Dep_number)) |> t()
  
  Crime_total <- rbind(Crime_total, crimes_depart[2,])
}

Crime_total <-  Crime_total[-1, ]
Crime_total <- cbind(List_depart[1], Crime_total)

We create a total number of crimes committed by department and by year.

Code
Crime_2019 <- Crime |>
  select(Dep_number, `2019`) |>
  rename(Crime_tot_2019 = `2019`)

Crime_2019 <- full_join(Crime_2019, Pop_2019, join_by("Dep_number"))
View(Crime_2019)

Crime_2019 <- Crime_2019 |>
  mutate(Crime_rate_1k = (Crime_tot_2019/pop_2019)*1000)

We then only select the year 2019 and compute the crime rate per thousand people. To do this we must add pur previously created population in 2019 dataframe.

2.2.1.4 Unemployment

This dataset is a bit trickier.

Code
dummy <- seq_len(nrow(Unemployement)) %% 2 #Dummy variable equal to 1 if the row is an odd number
Unemployement <- cbind(dummy, Unemployement)

Unemployement <- Unemployement |>
  filter(dummy == 1) |>
  select(-c(1, 3,4,5))
Unemployement <-  Unemployement[-(1:15),] # only select departments

Unemployement <- Unemployement |> 
  separate(Libellé, into = c("X", "Departement") ,sep=" - ") |>
  select(-1) 

Unemployement <- Unemployement[-(97:100), ]

First we need to remove every odd number row as they do not provide us with any information. We create a dummy variable that equals 1 if the row is an odd number, we then only select our rows with data. We then only keep the department name from the department name. We then only select metropolitan departments

Code
Unemployment_quarterly <- Unemployement |>
  pivot_longer(cols = -Dep_name, names_to = "Year", values_to = "Total") |>
  mutate(Total = as.numeric(Total)) |>
  mutate(Year = substr(Year, 2, 8)) |>
  group_by(Dep_name, Year) |>
  summarize(Total= mean(Total)) |>
  pivot_wider(names_from = Year, values_from = Total)

Unemployment_quarterly <- full_join(List_depart, Unemployment_quarterly, join_by(Dep_name))

We now need to compute the quarterly unemployment rate. To do this we use pivot_longer to be able to summarize our data by year and department. we then compute a mean unemployment rate for each year and rearrange our departments names to be inline with our list of departments using the full_join() function.

Code
Unemployment_year <- Unemployment_quarterly |>
  pivot_longer(cols = -(Dep_number:Dep_name), names_to = "Year", values_to = "Total") |>
  mutate(Year = substr(Year, 1, 4)) |>
  group_by(Dep_number, Dep_name, Year) |>
  summarize(Total = mean(Total)) |>
  pivot_wider(names_from = Year, values_from = Total)

We now compute the Unemployment by year by computing the mean of each quarter.

Code
Unemp <- Unemployment_year |> 
  select(Departement, `2019`) |>
  rename(Dep_name = Departement,
         Unemp_2019 = `2019`)

And we select only the year 2019 for our clustering and PCA analysis.

2.2.1.5 Middle school results

Code
Middle <- Middle_results |>
  filter(Session == 2019) |>
  select(Code.département, Admis, Inscrits) |>
  mutate(Pass_rate = (Admis/Inscrits)) |>
  group_by(Code.département) |>
  summarize(Pass_rate_2019 = mean(Pass_rate)) |>
  rename(Dep_number = Code.département) |>
  mutate(Dep_number = substr(Dep_number, 2, 3))

Middle <- Middle[(2:97), ]

Here we start by only taking the year 2019, we then only select useful columns, we then compute the Pass_rate which is the percentage of students who obtained the certificate. We then change the department number to be inline with our naming convention. Lastly, we select metropolitan departments.

2.2.1.6 Election results

Code
Election_by_dep <- Election_dep |>
  select(c(1,28)) |>
  rename(Dep_number = `Code du département`,
         Lepen_score = `% Voix/Exp...28`) |>
  mutate(Win_lepen = ifelse(Lepen_score>50, 1, 0))

Election_by_dep <- Election_by_dep[1:96, ]

#rename dep_number
for (i in 1:nrow(Election_by_dep)) {
  if(nchar(Election_by_dep$Dep_number[i]) == 1) {
    Election_by_dep$Dep_number[i] <- paste0("0",  Election_by_dep$Dep_number[i])
  }
}

For our Election results, we select the columns we need. Then, we select metropolitan departments. Then we need to change our department number to our naming convention to be able to use it as a key. As a reminder, our naming convention is to write single digit department with a leading zero.

2.2.1.7 Immigration

Code
Immig_2019 <- Immigration_2019 |>
  slice(1:96) |>
  select(c(1, 2, 4)) |>
  rename(Dep_name = `...1`,
         Immig_tot = Immigrés,
         pop_2019 = `Ensemble...4`) |>
  mutate(Immig_rate = Immig_tot/pop_2019) |>
  select(Dep_name, Immig_rate)

Immigration is quite simple, we select only metropolitan departments with the slice() function. We then select columns that we need, rename them in english and compute the immigration rate which is Total number of immigrants divided by the total population.

2.2.1.8 Population density

Code
Dens_2019 <- population_2019 |>
  rename(Dep_number = Département,
         Density_2019 = Densité) |>
  select(Dep_number, Density_2019) |>
  slice(1:96)

Similarly, population density is quite easy. we rename our selected columns to our naming convention, we then select only metropolitan departments using the slice() function.

2.2.1.9 Join them into a single dataset

Our last step is to join all of our previously cleaned datasets into a single dataset

Code
Full_data_dep <- full_join(List_depart, Unemp, join_by("Dep_name")) |>
  full_join(Crime_2019, join_by("Dep_number")) |>
  full_join(Middle, join_by("Dep_number")) |>
  full_join(Election_by_dep, join_by("Dep_number")) |>
  full_join(Immig_2019, join_by("Dep_name")) |>
  full_join(Dens_2019, join_by("Dep_number"))

2.2.2 By Town

2.2.2.1 Crime

Code
Type_per_year <- Crimes |> 
  filter(valeur.publiée == "diff") |>
  filter(!grepl("^97", CODGEO_2023)) |>
  select(CODGEO_2023, annee, classe, faits, tauxpourmille)

Type_per_year <- cbind(Type_per_year[1], Type_per_year)
colnames(Type_per_year) <- c("Departement", "Town", "Year", "Type", "Number", "Rate_per_1k")

Type_per_year <- Type_per_year |> 
  mutate(Departement = substr(Departement, 1, 2)) |>
  mutate(Rate_per_1k = str_replace_all(Rate_per_1k, ",", ".")) |>
  mutate(Rate_per_1k = as.numeric(Rate_per_1k)) |>
  group_by(Departement, Town, Year, Number, Type) |>
  summarize(Rate_per_1k = mean(Rate_per_1k, na.rm = TRUE), ) |> #omit the missing values because some are missing even though they were indicated as published
  group_by(Departement, Town, Year, Type, Rate_per_1k) |>
  summarize(Number = sum(Number))

We use this code to get the data by type and by year for each town. First, we only select values indicated as publishing their data (“diff”). We then only take metropolitan departments, by filtering every row except those starting with 97(the indicator for outer seas department). We then duplicate the town code to create a Department column. we rename the columns for more clarity. we then create our department column by substr() the first 2 digits of the town code. We now need to compute the rate for a thousand. To do this, we first must replace every comma by a point, that is because European notation of numbers uses a comma instead of a point for decimal numbers. we then use *group_by() and summarize() to get the mean rate. and use the same functions to get the total number of crimes. This dataframe will be useful for our plots

Code
Crime_2019_town <- Type_per_year |>
  filter(Year == 19) |>
  group_by(Departement, Town) |>
  summarize(Rate_per_1k = mean(Rate_per_1k),
         Total_crime = sum(Number)) |>
  rename(Town_code = Town, Dep_number = Departement)

We then summarize the rate value by town to use in our regression

Code
#Create a dataset where we look at the rate for each type of crime
Crime_type_town <- Type_per_year |>
  filter(Year == 19) |>
  rename(Town_code = Town, Dep_number = Departement) |>
  select(Dep_number, Town_code, Type, Rate_per_1k) |>
  pivot_wider(names_from = Type, values_from = Rate_per_1k) |>
  select(Dep_number, Town_code, "Cambriolages de logement", 
         "Coups et blessures volontaires", 
         "Destructions et dégradations volontaires", 
         "Usage de stupéfiants",
         "Vols sans violence contre des personnes")

Crime_type_town[is.na(Crime_type_town)] <- 0

We also create a dataset where we keep the type of crime commited. To do this, we simply do a pivot_wider() and select our types of crime. In this instance we only focus on the 5 most commited crimes. Since some cities do not have every type of crime present, we are left with a substantial amount of NA values. We decide to replace our NA’s with zeroes since an NA value indicates that there was less than 5 instances of a crime.

2.2.2.2 Density

Code
Density_town <- Dense |>
  select(c(4, 11, 15, 19)) |>
  rename(Town_name = `X4`, 
         Town_code = `X11`,
         Pop_2012 = `X15`,
         Size = `X19`) |>
  filter(!str_detect(Town_code, "^97")) |>
  mutate(Density_2019 = Pop_2012/Size) |>
  select(Town_code, Town_name, Density_2019)

To do this, we first select the columns we need. In that case, the name of the town, the Town_code, the population and the size. We then rename them to our standard, only select metropolitan departments and compute the density(Population/Size) lastly, we select only our important columns.

2.2.2.3 Population

Code
Pop_by_town <- Pop_2019 |>
  group_by(CODGEO) |>
  summarise(Total = sum(NB)) |>
  rename(Town_code = CODGEO, Total_pop = Total)
view(Pop_by_town)

The cleaning for this dataset is very simple, we compute the sum of every age group and gender group to find the total number of inhabitants by town.

2.2.2.4 Population without a diploma

Code
Pop_no_diploma <- Pop_16_no_diploma |>
  select(c(2, 3, 6, 8, 10)) |>
  slice(-1) |>
  rename(Dep_number = `Département\nen géographie courante`,
         Town_number = `Commune\nen géographie courante`,
         Town_name = `Libellé de commune`,
         no_diploma_M = `Aucun diplôme\nHommes\n25 ans ou plus\nRP2019`,
         no_diploma_F = `Aucun diplôme\nFemmes\n25 ans ou plus\nRP2019`) |>
  group_by(Dep_number, Town_number) |>
  summarize(no_diploma = sum(as.numeric(no_diploma_F), as.numeric(no_diploma_M), na.rm = TRUE)) |>
  mutate(Town_code = paste0(Dep_number, Town_number)) |>
  select(Dep_number, Town_code, no_diploma) |>
  filter(!str_detect(Dep_number, "^97")) |>
  left_join(Pop_by_town, join_by(Town_code)) |>
  na.omit() |>
  mutate(No_diploma_rate1k = (no_diploma/Total_pop)*1000)

Paris_tot <- Pop_no_diploma |>
  filter(Dep_number == "75") |>
  summarize(No_diploma_rate1k = mean(No_diploma_rate1k)) |>
  rename(Town_code = Dep_number) |>
  mutate(Town_code = "75056")

Pop_no_diploma <- Pop_no_diploma |>
  group_by(Town_code, No_diploma_rate1k) |>
  select(Town_code, No_diploma_rate1k)

Pop_no_diploma <- rbind(Pop_no_diploma, Paris_tot)

To have our final data, we first select our needed columns. we then remove the first row and rename the columns. we then compute the total number of people without a diploma and make sur to remove any NA values. we then select only metropolitan departments, with filter() and str_detect. We then compute a left join with the dataframe Pop_by_town to get the population by town. afterwards, we compute the rate at which people do not have a diploma for a thousand people by dividing the total number of people without any diploma by the population of the town and multiply it by a thousand.

Lastly, we have noticed that this dataset provides us with data from every subdivision in the city of Paris, while this level of precision could be welcome, all of our other datasets counts Paris as a single city. As such, we must aggregate our rate for the city of Paris. To do this, we create a new df called Paris_tot, we then summarize all of it. We add the row at the end of our previous dataframe. It is unordered, but will become ordered when we use the join function later on.

2.2.2.5 2017 election results.

Code
Vote_2017 <- Presidentielle_2017_Resultats_Communes_Tour_2_c |>
  select(c(1, 2, 3, 25, 32)) #select ony the values important to us in that case, the department, the towncode and the result
Vote_2017 <- Vote_2017[1:35281, ]
colnames(Vote_2017) <- c("Department", "Dep_name", "Town_num", "Macron", "Lepen")

#Need to add the Corse departement number(2A/2B)
Vote_2017$Department <- as.character(Vote_2017$Department)

for (i in 1:nrow(Vote_2017)) {
  if (Vote_2017$Dep_name[i] == "Corse-du-Sud") {
    Vote_2017$Department[i] <-  "2A"
  }
  if (Vote_2017$Dep_name[i] == "Haute-Corse") {
    Vote_2017$Department[i] <-  "2B"
  }  
}

#Now let's standardize the notation of departments and town codes by adding zeroes before.
for (i in 1:nrow(Vote_2017)) {
  if (nchar(Vote_2017$Department[i]) == 1) {
    Vote_2017$Department[i] <- paste0("0", Vote_2017$Department[i])
  }
  if (nchar(Vote_2017$Town_num[i])<=2) {
    if (nchar(Vote_2017$Town_num[i]) <= 1) { 
           Vote_2017$Town_num[i] <- paste0("00", Vote_2017$Town_num[i])
    }
           else {Vote_2017$Town_num[i] <- paste0("0", Vote_2017$Town_num[i])
           }
  }
}

#Create the town key
Vote_2017 <- Vote_2017 |> 
  mutate(Town_code = paste0(Department, Town_num)) |>
  mutate(Win_Lepen = ifelse(Lepen>50, 1, 0)) |>
  select(Department, Town_code, Macron, Lepen, Win_Lepen)#Create a dummy variable that equals 1 if LePen "won" the town (scored more than 50%)

we first select our needed columns. then, we only select metropolitan departments and rename our columns. the next step is to code the Corsica departments numbers. Corsica is split into 2 departments. 2A: “Corse-du-Sud” and 2B: “Haute-Corse”. these 2 departments are found between the department 19 and department 21. They used to be one single department butt were split in 2 in 1975. In our database, only the name can be found. we need the number. we use a for loop to code it into the Departement column. We then standardize our Departments numbers and town codes by adding leading zeroes. This then allows us to create our town_number key. Finally, we code a dummy variable that equals 1 if Marine Lepen had more than 50% of the votes in a town.

2.2.2.6 Poverty

Code
Povr_2019 <- Pov_2019 |>
  select(c(1, 4, 6)) |>
  na.omit() |>
  rename(Town_code = CODGEO,
         Povrety_2019 = TP6019,
         Intensity_povrety = TP60IP19)

Poverty is quite simple we select our columns, rename them and we do not forget to omit values that were blank

2.2.2.7 Immigration

Code
Immig_2019_town <- Immig_by_town |>
  select(CODGEO, IMMI, NB) |>
  rename(Town_code = CODGEO) |>
  filter(IMMI == 1) |>
  filter(!str_detect(Town_code, "^97")) |>
  group_by(Town_code) |>
  summarize(NB = sum(NB)) |>
  left_join(Pop_by_town, join_by(Town_code)) |>
  mutate(Immig_rate = (NB/Total_pop)*1000 ) |>
  na.omit() |>
  select(Town_code, Immig_rate)

Here we apply the same framework we have applied before. We select our needed columns, rename them. we then filter to only have the total of immigrants. then, we add the population from our previously created population dataset using left_join() we then compute the immigration rate, omit missing values.

2.2.2.8 Join them into a single dataframe

Finally, we join all of those datasets into a single one. To do this, we first use full_join() to see how values are missing.

Code
Everything_by_town <- Crime_2019_town |>
  full_join(Pop_by_town, join_by(Town_code)) |> 
  full_join(Density_town, join_by(Town_code)) |>
  full_join(Vote_2017, join_by(Town_code)) |>
  full_join(Povr_2019, join_by(Town_code)) |>
  full_join(Pop_no_diploma, join_by(Town_code)) |>
  full_join(Immig_2019_town, join_by(Town_code)) |>
  left_join(Unemp, join_by(Dep_number))

summary(Everything_by_town)

As we can see, Poverty is a smaller dataset. This may be a problem, but upon further inspection, we see that the smallest population is 1’700 inhabitants. And we still have around 5’000 observations. We previously thought about removing every observation where the number of inhabitants of a town was smaller than 200, because of the poor statistical analysis that would stem from such extreme values. the INSEE institute were most of our data has been collected recommends not doing any analysis on towns of less than 200 inhabitants. As such, the reduced number of observations with the poverty dataset is not a problem. We also add our unemployment numbers from our by_dep dataset because we were unable to find accurate unemployment numbers by department.

Code
Everything_by_town_clean <- Everything_by_town |>
  na.omit() |>
  select(Dep_number, 
         Town_code, 
         Town_name, 
         Total_pop, Rate_per_1k, Density_2019, Lepen, Win_Lepen, Povrety_2019, Intensity_povrety, No_diploma_rate1k, Immig_rate, Unemp_2019)

then, we omit our missing values, select only our very important values and we have our final dataset. that we may use for EDA and most importantly for our Regression analysis. Were we will predict the crime rate of a town using our independent variables.

We then create a dataset for each type of crime per town.

Code
Crime_per_type_town <- Crime_type_town |>
  full_join(Everything_by_town_clean, join_by(Town_code)) |>
  select(Dep_number.x, Town_code, Town_name, 
         Total_pop, Total_crime, Rate_per_1k, 
         "Cambriolages de logement", "Coups et blessures volontaires", "Destructions et dégradations volontaires", 
         "Usage de stupéfiants", "Vols sans violence contre des personnes",
         Density_2019, Lepen, Win_Lepen, 
         Povrety_2019, Intensity_povrety, No_diploma_rate1k, 
         Immig_rate, Unemp_2019) |>
  na.omit() |>
  rename(Assault = `Coups et blessures volontaires`,
         Burglary = `Cambriolages de logement`,
         Damage = `Destructions et dégradations volontaires`,
         Drugs = `Usage de stupéfiants`,
         Theft = `Vols sans violence contre des personnes`,
         Dep_number = Dep_number.x
         ) |>
  select(-Year) |>
  mutate(Lepen= Lepen/100,
         Povrety_2019 = Povrety_2019/100,
         No_diploma_rate1k = No_diploma_rate1k/1000,
         Immig_rate = Immig_rate/1000,
         Unemp_2019 = Unemp_2019/100)

2.3 Final results

We finally have these two datasets.

The Department level

Dep_number Dep_name Unemp_2019 Crime_tot_2019 pop_2019 Crime_rate_1k Pass_rate Lepen_score Win_lepen Immig_rate Density_2019
01 Ain 6.075 27831 652432 42.65732 0.8701494 39.06 0 0.1210364 113
02 Aisne 11.650 26577 531345 50.01835 0.8700065 52.91 1 0.0478296 72
03 Allier 8.900 14440 335975 42.97939 0.8657765 36.10 0 0.0518580 46
04 Alpes-de-Haute-Provence 9.775 8831 164308 53.74662 0.8419413 41.54 0 0.0810003 24
05 Hautes-Alpes 7.975 6135 141220 43.44286 0.8661628 35.88 0 0.0580088 25
06 Alpes-Maritimes 8.750 74434 1094283 68.02080 0.9132394 44.65 0 0.1482542 255

The town level

X Dep_number Town_code Town_name Total_pop Total_crime Rate_per_1k Density_2019 Lepen Win_Lepen Povrety_2019 Intensity_povrety No_diploma_rate1k Immig_rate Unemp_2019
1 01 01004 AMBERIEU-EN-BUGEY 14134 722 4.658571 562.39837 41.11 0 28 0.385 106.40581 111.72853 6.075
2 01 01007 AMBRONAY 2800 74 5.020878 70.40238 43.71 0 13 0.276 42.85714 36.07143 6.075
3 01 01014 ARBENT 3349 114 8.697631 147.97786 32.11 0 19 0.277 201.31667 230.27368 6.075
4 01 01024 ATTIGNAT 3251 35 3.080986 157.08935 41.36 0 12 0.274 46.58719 34.66931 6.075
5 01 01025 BAGE-LA-VILLE 4065 29 1.687562 76.61290 41.31 0 11 0.305 66.42066 30.01230 6.075
6 01 01031 BELLIGNAT 3652 117 6.283417 451.72414 37.29 0 29 0.389 191.67903 238.39974 6.075

3 Visualization

3.1 Crime

3.1.1 Evolution of the total crime rate

In this graphic we chose to compute the crime rate per hundred persons in the 5 departments with the biggest and the 5 with the lowest crime rate.

In this graphic we chose to compute the crime rate per hundred persons in the 5 departments with the highest crime rate and those with the lowest crime rate. The first thing we can observe is the major disparity between the average crime number in those departments. In fact the highest crime rate in the Paris is almost 5 times higher than the Aveyron 30.85 vs 145.25 It’s important to note that the departments with the higher crime rate are the one encompassing the biggest cities in France: Paris and it’s suburbs of “Seine-Saint-Denis” and “Hauts-de-Seine”, Marseille is in the “Bouches-du_Rhône”, Lyon is located in the “Rhône”. We can start to understand that density may be a key factor in determining crime rate.

Evolution of Crime in the 5 most criminal departments

Here we can observe that the total crimes in the most criminal departments are lower than what they were twenty years ago in the city of Paris (75) and Marseille (13) but are sensibly the same or higher in the peripheral areas of Paris (92/93) and in the city of Lyon (69). This can possibly be explained by the huge increase of density of population in the peripheral areas of big cities in France.

This graph represents the annual amount of crimes in the 5 least criminal departments. Here we can assume that the number of crimes in the safest departments is quite stable through the years. We also observe the small variation due to COVID-19 Pandemic

3.1.2 Types of crime commited

We start by looking at the evolution of the most common types of crime by year in France.

Here we can see that the most common crime in France is deliberate destruction and damage. The effect of the COVID-19 pandemic can also be observed as most crimes have seen an overall drop in 2020. For example, theft without violence saw a sharp drop and its levels have not gone back to 2019 levels of theft.

The Types of crime committed also depend on the location.

In these graphs, we can see the difference in the types of crime committed in the Paris department and in the Cher department. Cher is sparsely populated department. Here we observe that in Paris, theft without violence is the number one committed crime. This is surely due to Paris being a big city with many tourists and as such many pickpockets. This may not be the case in the Cher region with only 300’000 inhabitants. In the Cher, theft is less prevalent with the biggest Crime rate being deliberate destruction and damage. Followed by Burglaries.

3.2 Unemployment dataset

3.2.1 Evolution of the unemployment rate

Let’s now take a look at our first variable: Unemployment. Unemployment in France has evolved through time and has usually been higher than unemployment in Switzerland.

We can observe a huge increase in the unemployment rate in the 90’s in France. As we know after the Second World War, European countries had an important period of economic growth, this period is known as the « 30 glorieuses ». This period matched with an important demand of workers, from Europe and from other continents. There was a need to rebuild the country and an important diversification of the jobs with help of technological progress. In the early 80’s, an important recession started in Europe, with a decrease of job creation in most of the important European countries, especially Germany, Great Britain and France. The the unemployment rate was approximately increasing by 1% a year in France in 1990, to reach the record rate of 11% in 1997. This is mainly explained by the two oil shocks in the 70’s and 80’s. The French president at the time, Francois Mitterand said in 1993 : « In terms of unemployment, we’ve tried everything and we’ve failed everything ». Following this recession France had an important decrease of unemployment rate, the explanations are multiple but the most plausible is the creation of the Euro zone and currency. This tendency lasted until the major financial crisis of 2008. In the last decade the unemployment rate has been mastered once again until the COVID 19 pandemic in 2020 which created a massive suppression of jobs in various areas such as the show industry or the food service sector.

3.2.2 Unemployment by department

We can observe with this graph that there’s an important disparity in unemployment rates in France. It’s also relevant that one of the departments with the higher crime rate which we’ve seen before, is also one of the departments with the higher crime rates (93). This may be an indication of what we could possibly find with the statistical regressions.

3.2.3 Maps

This map compute the crime rate in every French department and shows us the important disparity in the amount of crimes committed across the country.

This map shows us the score in the second round of the French presidential election of 2017, which we choose to compute as a possible explanation variable. As we know Emmanuel Macron has been elected president with 66.10% so the departments overall voted for him, however it is interesting to see the departments where Marine LePen had an important share of the votes. Here those departments are represented with a darker color.

here this map shows the school pass rate for each department at the end of middle school. Unfortunately, no clear trend seems to emerge from the map. Futhermore, the difference between the highest value (0.9305) and the lowest value (0.8274) is only of 10%. This means that all other values are contained within only 10%. Showing them on a map with colors tends to over-exagerate the differences between values. As such, this map has poor vizualization power and may even be considered somewhat misleading. It is still left here in the intermediary report, but may be removed from the final report.

Lastly, this map shows us th immigration rate, be department. As we can see, Immigrants are mostly located in the vicinity of the Paris region. Other notable areas of higher immigration are border regions in the south and in the east.

3.2.3.1 Interactive Maps

We create a map using leaflet. Here it should be noted that for clarity reasons, the values of Immigration, Density and Crime rate have all been capped. This is because the parisian region is an outlier for these variables. As such, the true value of the crime rate is not accuratly represented for these variables. This is a tradeoff between showing the differences of between french departments and representing the Paris region criminality.

In the above map, one can see the crime rate for many towns in France. To display them, simply click on the desired department. This was done to avoid confusing overlap, especially in the Paris region. It shoudl be noted that not all towns are represented here.

3.3 Clustering

We will now run a cluster analysis of each department, we want to see if some departments have simmilarities.

First, we compute the correlation matrix.

Unemp_2019 Crime_rate_1k Pass_rate Lepen_score Immig_rate Density_2019
Unemp_2019 1.0000000 0.2554038 -0.4734385 0.5346616 0.1253533 -0.0838700
Crime_rate_1k 0.2554038 1.0000000 -0.2234582 -0.2514767 0.6862923 0.7659192
Pass_rate -0.4734385 -0.2234582 1.0000000 -0.3175815 -0.2521074 0.0489446
Lepen_score 0.5346616 -0.2514767 -0.3175815 1.0000000 -0.2983429 -0.4813112
Immig_rate 0.1253533 0.6862923 -0.2521074 -0.2983429 1.0000000 0.5752951
Density_2019 -0.0838700 0.7659192 0.0489446 -0.4813112 0.5752951 1.0000000

As we can see, Density is highly correlated with Crime Rates. In contrast, Density and unemployment have a very low correlation. Other interesting Correlations are the one with Immigration and Crime rates, Unemployment and the Score of Marine Lepen.

3.3.1 Hierarchical clustering

We now compute the distance matrix. Using this distance matrix, we run a hierarchical clustering analysis using the average distance method.

we get the following dendrogram.

This graph tells us that observation 76 stands out from the rest, Observation 76 is actually Paris. Paris is a clear outlier. Based on this graph we could try to use 3 clusters.

We can now visualize this using our 2 most correlated variables, Crime rates and Density. ::: {.cell} ::: {.cell-output-display} ::: :::

This is quite interesting, our 3 clusters are Paris, the Paris region and the rest of France. To further show this, we can see this in a map format. ::: {.cell} ::: {.cell-output-display} ::: :::

3.3.2 K-means clustering

To further our analysis and chose a number of clusters, we will run K-means analysis with n-start = 25. Our data will also be scaled to reduce differences. This approach is what is recommended, we will run the same clustering 25 times with different centers as we initialize. Running it 25 times will ensure that we get the best clusters we can.

We get the followig scree plot from our analysis: ::: {.cell} ::: {.cell-output-display} ::: ::: as we can tell, our optimal number of clusters seems to be 6 clusters.

We’ll visualize this using a cluster plot and a facet wrap of Lepen Scores and Unemployment rate, we previously say that the correlation was decently high. ::: {.cell} ::: {.cell-output-display} :::

:::

As before, we will map it for more clarity. ::: {.cell} ::: {.cell-output-display} ::: :::

Overall, during this clustering exercise, our main learning is that Paris and it’s suburbs are very big compared to the rest of France, so much that it tends skew the analysis. For our final report, it could be interesting to remove Paris and the prarisian region to be able to conduct an analysis of the rest of our departments

4 Regression

4.1 Correlations

To see how our data may be correlated, we will start by running a correlation analysis.

Total_pop Total_crime Rate_per_1k Burglary Assault Damage Drugs Theft Density_2019 Lepen Win_Lepen Povrety_2019 Intensity_povrety No_diploma_rate1k Immig_rate Unemp_2019
Total_pop 1.0000000 0.9671113 0.0745876 0.0761699 0.1671192 0.1657608 0.1280443 0.1044380 0.3750675 -0.1401256 -0.0469309 0.1411815 0.1991688 0.0168425 0.2094886 -0.0000037
Total_crime 0.9671113 1.0000000 0.0857828 0.0513501 0.1185074 0.1208010 0.0886172 0.1092204 0.2876953 -0.0904293 -0.0270226 0.0913764 0.1291260 0.0084762 0.1343885 0.0038679
Rate_per_1k 0.0745876 0.0857828 1.0000000 0.2993310 0.5862414 0.6278417 0.3172191 0.8537541 0.0827408 0.0550478 0.0477199 0.1556111 0.2207339 0.1034802 0.1837904 0.1307496
Burglary 0.0761699 0.0513501 0.2993310 1.0000000 0.1046776 0.1233708 0.0360979 0.0260185 0.1682410 -0.0796135 -0.0183377 -0.0556103 0.1855126 -0.1341978 0.2590153 0.1295899
Assault 0.1671192 0.1185074 0.5862414 0.1046776 1.0000000 0.7038243 0.4589958 0.5342329 0.2456963 0.1082044 0.1122877 0.5665724 0.5354627 0.4130961 0.3635960 0.1737369
Damage 0.1657608 0.1208010 0.6278417 0.1233708 0.7038243 1.0000000 0.4271578 0.4671829 0.2212841 0.0609374 0.0754702 0.5140205 0.5119915 0.3451733 0.3094494 0.1791224
Drugs 0.1280443 0.0886172 0.3172191 0.0360979 0.4589958 0.4271578 1.0000000 0.2415101 0.2018375 -0.0637514 -0.0206959 0.3143225 0.3059773 0.2027961 0.3055331 0.0224611
Theft 0.1044380 0.1092204 0.8537541 0.0260185 0.5342329 0.4671829 0.2415101 1.0000000 0.0934956 -0.0298697 -0.0165724 0.0719966 0.0975024 0.0372630 0.1145932 0.0265083
Density_2019 0.3750675 0.2876953 0.0827408 0.1682410 0.2456963 0.2212841 0.2018375 0.0934956 1.0000000 -0.2946410 -0.0785837 0.2032937 0.3043823 0.0420265 0.4991462 -0.0054624
Lepen -0.1401256 -0.0904293 0.0550478 -0.0796135 0.1082044 0.0609374 -0.0637514 -0.0298697 -0.2946410 1.0000000 0.6723312 0.3046409 0.0610666 0.4016264 -0.1804670 0.5326813
Win_Lepen -0.0469309 -0.0270226 0.0477199 -0.0183377 0.1122877 0.0754702 -0.0206959 -0.0165724 -0.0785837 0.6723312 1.0000000 0.2571925 0.1031373 0.2200172 -0.1018818 0.4410174
Povrety_2019 0.1411815 0.0913764 0.1556111 -0.0556103 0.5665724 0.5140205 0.3143225 0.0719966 0.2032937 0.3046409 0.2571925 1.0000000 0.6794585 0.7382237 0.4190565 0.3586537
Intensity_povrety 0.1991688 0.1291260 0.2207339 0.1855126 0.5354627 0.5119915 0.3059773 0.0975024 0.3043823 0.0610666 0.1031373 0.6794585 1.0000000 0.4264645 0.5340969 0.2414833
No_diploma_rate1k 0.0168425 0.0084762 0.1034802 -0.1341978 0.4130961 0.3451733 0.2027961 0.0372630 0.0420265 0.4016264 0.2200172 0.7382237 0.4264645 1.0000000 0.3072764 0.2141137
Immig_rate 0.2094886 0.1343885 0.1837904 0.2590153 0.3635960 0.3094494 0.3055331 0.1145932 0.4991462 -0.1804670 -0.1018818 0.4190565 0.5340969 0.3072764 1.0000000 -0.0214125
Unemp_2019 -0.0000037 0.0038679 0.1307496 0.1295899 0.1737369 0.1791224 0.0224611 0.0265083 -0.0054624 0.5326813 0.4410174 0.3586537 0.2414833 0.2141137 -0.0214125 1.0000000

We first decide to compute the correlations with all of our variables dividing « crimes » in every crime categories to see if there are interesting correlation links to explain them. As we can see Burglaries are highly correlated with Immigration rate and Unemployment rate and negatively correlated to the No diploma rate. On the other side Assault and Battery is mainly correlated to poverty rate and No Diploma rate. Intentional Damage behave the same as Assault and Battery, it is also interesting to observe that voting for Marin Lepen is positively correlated with these two variables. For Drug use we observe that the two more correlated variables are Immigration and Unemployment rate. And finally we observe that Theft without violence only has small correlations with all of our explicative variables.

4.2 Linear regression

We started by scaling all variables, expressing them in percentages except for density (people/km^2) and Win_Lepen (a binary variable ranging from 0 to 1). The initial regression included the average of all crimes and potential explanatory variables. Subsequently, we conducted a stepwise regression AIC test to identify indispensable variables, leading to the removal of Win_Lepen, Intensity Poverty, Poverty, and No Diploma Rate.

4.2.1 All crimes

The initial regression includes the average of all crimes and potential explanatory variables. With a formula as seen below:

\[ RatePer1k = \beta_0 + \beta_1 Density + \beta_2WinLepen + \beta_3Poverty + \beta_4NoDiplomaRate + \beta_5ImmigrationRate + \beta_5UnemploymentRate + \beta_6IntensityPoverty + \beta_6 Total Pop \]

Subsequently, we conducted a stepwise regression AIC test to identify indispensable variables, leading to the removal of Win_Lepen, Intensity Poverty, Poverty, and No Diploma Rate.

Code
step(Assault_model_full, scope = list(lower= Assault_model_null, upper=Assault_model_full),
     direction = "backward")

Start: AIC=7497.75 Assault ~ Density_2019 + Povrety_2019 + No_diploma_rate1k + Immig_rate + Unemp_2019 + Win_Lepen + Intensity_povrety

                Df Sum of Sq   RSS    AIC
  • Win_Lepen 1 1.90 22479 7496.2 22478 7497.7
  • Unemp_2019 1 9.67 22487 7497.9
  • Immig_rate 1 24.45 22502 7501.1
  • No_diploma_rate1k 1 32.69 22510 7502.9
  • Density_2019 1 179.52 22657 7535.0
  • Intensity_povrety 1 963.09 23441 7702.7
  • Povrety_2019 1 1102.14 23580 7731.9

Step: AIC=7496.16 Assault ~ Density_2019 + Povrety_2019 + No_diploma_rate1k + Immig_rate + Unemp_2019 + Intensity_povrety

                Df Sum of Sq   RSS    AIC
  • Unemp_2019 1 7.85 22487 7495.9 22479 7496.2
  • Immig_rate 1 23.11 22502 7499.2
  • No_diploma_rate1k 1 33.99 22513 7501.6
  • Density_2019 1 178.80 22658 7533.3
  • Intensity_povrety 1 961.36 23441 7700.8
  • Povrety_2019 1 1118.95 23598 7733.8

Step: AIC=7495.89 Assault ~ Density_2019 + Povrety_2019 + No_diploma_rate1k + Immig_rate + Intensity_povrety

                Df Sum of Sq   RSS    AIC

22487 7495.9 - Immig_rate 1 29.57 22517 7500.4 - No_diploma_rate1k 1 36.55 22524 7501.9 - Density_2019 1 179.43 22667 7533.1 - Intensity_povrety 1 954.00 23441 7698.9 - Povrety_2019 1 1151.78 23639 7740.3

Call: lm(formula = Assault ~ Density_2019 + Povrety_2019 + No_diploma_rate1k + Immig_rate + Intensity_povrety, data = Crime_type)

Coefficients: (Intercept) Density_2019 Povrety_2019 No_diploma_rate1k
-2.5713593 0.0001265 10.0408601 3.3073259
Immig_rate Intensity_povrety
1.6952493 8.2349352

We find this reduced model. compared to our full model

Comparison of the full model and our reduced model
Dependent variable:
Rate_per_1k
Full model Reduced model
(1) (2)
Density_2019 -0.00005* -0.00005*
(0.00003) (0.00003)
Win_Lepen 0.055
(0.122)
Povrety_2019 -1.736**
(0.823)
No_diploma_rate1k 0.893
(1.469)
Immig_rate 5.925*** 8.769***
(0.854) (0.732)
Unemp_2019 18.243*** 22.539***
(2.721) (2.319)
Intensity_povrety 5.532***
(0.714)
Total_pop 0.00000** 0.00000***
(0.00000) (0.00000)
Constant 0.299 1.380***
(0.268) (0.201)
Observations 4,934 4,934
R2 0.066 0.054
Adjusted R2 0.065 0.053
Residual Std. Error 2.665 (df = 4925) 2.682 (df = 4929)
F Statistic 43.702*** (df = 8; 4925) 70.095*** (df = 4; 4929)
Note: p<0.1; p<0.05; p<0.01

VIF analysis for our reduced model

Density_2019 Immig_rate Unemp_2019 Total_pop
vif(Reduce_model) 1.483003 1.333468 1.000503 1.16461

The remaining significant variables, including Immigration, Unemployment, TotalPop, and Density, passed the Variance Inflation Factor test, ensuring no issues of multicollinearity (VIF score < 5). However, we noted a low Adjusted R-Squared, indicating that our model explained only a small proportion of the data. To address this, we performed separate regressions for the five most prevalent crime categories in France: Assault and Battery, Burglaries, Wilful Damage and Destructions, Drug Use, and Theft without Violence.

It’s crucial to emphasize that the stepwise regression with AIC mentioned earlier was applied to each of our regressions. The regressions presented below have been refined to exclude irrelevant variables based on this test.

4.2.2 Assault and Battery

We run a stepwise regression with Asssault and Battery as the dependent variable

Reduced Assault model with stepwise regression
Dependent variable:
Assault
Density_2019 0.0001***
(0.00002)
Povrety_2019 10.041***
(0.632)
No_diploma_rate1k 3.307***
(1.169)
Immig_rate 1.695**
(0.666)
Intensity_povrety 8.235***
(0.570)
Constant -2.571***
(0.154)
Observations 4,934
R2 0.373
Adjusted R2 0.372
Residual Std. Error 2.136 (df = 4928)
F Statistic 586.264*** (df = 5; 4928)
Note: p<0.1; p<0.05; p<0.01

VIF analysis

Density_2019 Povrety_2019 No_diploma_rate1k Immig_rate Intensity_povrety
vif(Assault_model_final) 1.380414 3.445171 2.328073 1.741231 2.222481

In the Assault and Battery regression, Density, Poverty, No_Diploma_Rate, and Intensity_Poverty emerged as significant variables, with Intensity_Poverty having the most substantial impact (beta of 8.235). The VIF test confirmed the absence of multicollinearity, and the Adjusted R-Squared improved to around 0.37.

Our final formula is the following: \[ AssaultandBattery = \beta_0 + \beta_1 Density + \beta_2Poverty + \beta_3NoDiplomaRate + \beta_4ImmigrationRate + \beta_5IntensityPoverty \] ### Burglaries

We run the regression with Burglaries and find this reduced model

Reduced Burglary model with stepwise regression
Dependent variable:
Burglary
Povrety_2019 -19.624***
(1.425)
No_diploma_rate1k -17.833***
(2.515)
Immig_rate 26.937***
(1.340)
Unemp_2019 71.162***
(4.430)
Intensity_povrety 16.391***
(1.243)
Constant -2.353***
(0.451)
Observations 4,934
R2 0.194
Adjusted R2 0.193
Residual Std. Error 4.657 (df = 4928)
F Statistic 237.508*** (df = 5; 4928)
Note: p<0.1; p<0.05; p<0.01

VIF analysis

Povrety_2019 No_diploma_rate1k Immig_rate Unemp_2019 Intensity_povrety
vif(Burglary_model_final) 3.685396 2.268626 1.483072 1.21055 2.228921

For the Burglaries regression, all variables remained highly significant. Notably, Poverty and No_Diploma_Rate negatively affected predictions (-19.624/-17.833), while Unemployment had the most significant positive impact (71.162), followed by Immigration (26.937) and Intensity_Poverty (16.391). The VIF test assured us of no multicollinearity issues, though the Adjusted R-Squared was approximately 0.19.

Our final formula is the following: \[ Burglary = \beta_0 + \beta_1Poverty + \beta_2NoDiplomaRate + \beta_3ImmigrationRate + \beta_4UnemploymentRate + \beta_5IntensityPoverty \]

4.2.3 Intentional damage and destructions

We run the model with Damage and Destruction

Reduced Damage model with stepwise regression
Dependent variable:
Damage
Povrety_2019 15.533***
(0.806)
Win_Lepen -0.362**
(0.150)
Intensity_povrety 15.102***
(0.901)
Density_2019 0.0002***
(0.00003)
Constant -0.256
(0.234)
Observations 4,934
R2 0.319
Adjusted R2 0.319
Residual Std. Error 3.586 (df = 4929)
F Statistic 577.476*** (df = 4; 4929)
Note: p<0.1; p<0.05; p<0.01

VIF analysis

Povrety_2019 Win_Lepen Intensity_povrety Density_2019
vif(Damage_model_final) 1.989598 1.097345 1.972806 1.117902

In the Intentional Damage and Destruction regression, Poverty, Intensity_Poverty, Density, and somewhat less so Win_Lepen were highly significant. Both Poverty and Intensity had a substantial positive impact (~15), while the impact of Win_Lepen was negligible. The VIF test once again confirmed no multicollinearity.

Our final formula is the following: \[ DamageDestruction = \beta_0 + \beta_1Poverty + \beta_2WinLepen + \beta_3IntensityPoverty + \beta_4 Density \]

4.2.4 Drug Use

We run the model with Drug use as our dependent variable

Reduced Drug use model with stepwise regression
Dependent variable:
Drugs
Density_2019 0.0001***
(0.00002)
Povrety_2019 7.769***
(0.781)
No_diploma_rate1k -2.810**
(1.393)
Immig_rate 6.178***
(0.811)
Unemp_2019 -9.551***
(2.586)
Win_Lepen -0.319***
(0.116)
Intensity_povrety 2.754***
(0.678)
Constant -0.240
(0.254)
Observations 4,934
R2 0.148
Adjusted R2 0.147
Residual Std. Error 2.534 (df = 4926)
F Statistic 122.378*** (df = 7; 4926)
Note: p<0.1; p<0.05; p<0.01

VIF analysis

Density_2019 Povrety_2019 No_diploma_rate1k Immig_rate Unemp_2019 Win_Lepen Intensity_povrety
vif(Drug_model_final) 1.381153 3.73974 2.351868 1.837049 1.393577 1.310905 2.234937

For the Drug Use regression, the AIC stepwise regression test retained all variables except TotalPop. Poverty and Immigration Rate had the most significant positive impacts (7.769/6.178), while Unemployment had a considerable negative impact (-9.551). The VIF test indicated no multicollinearity issues.

Our final formula is the following: \[ DrugUse = \beta_0 + \beta_1 Density + \beta_2Poverty + \beta_3NoDiplomaRate + \beta_4ImmigrationRate + \beta_5UnemploymentRate + \beta_6WinLepen + \beta_7IntensityPoverty \]

4.2.5 Theft without violence

We run the model with Theft as our independent variable

Reduced Theft model with stepwise regression
Dependent variable:
Theft
Immig_rate 20.744***
(5.793)
Win_Lepen -1.198
(0.828)
Density_2019 0.0005***
(0.0002)
Unemp_2019 31.454*
(18.390)
Intensity_povrety 10.221**
(4.116)
Constant -1.736
(1.681)
Observations 4,934
R2 0.017
Adjusted R2 0.016
Residual Std. Error 18.498 (df = 4928)
F Statistic 17.320*** (df = 5; 4928)
Note: p<0.1; p<0.05; p<0.01

VIF analysis

Immig_rate Win_Lepen Density_2019 Unemp_2019 Intensity_povrety
vif(Theft_model_final) 1.756987 1.262294 1.337899 1.322221 1.547796

In the Theft without Violence regression, Immigration Rate and Density were the only highly significant variables. The positive impact of Immigration_Rate was substantial (~20), but the low Adjusted R-Squared suggests challenges in interpreting results. Nevertheless, the VIF test confirmed no multicollinearity issues in this regression.”

Our final formula is the following: \[ Theft = \beta_0 + \beta_1 Density + \beta_2WinLepen + \beta_3ImmigrationRate + \beta_4UnemploymentRate + \beta_5IntensityPoverty \]

4.3 Discussing our results

In our initial research, we wanted to understand if Unemployment could predict Crime in France. When we analyzed all types of crimes together, we found a connection between Unemployment and Crime. However, our findings lacked precision due to a low Adjusted R-Squared value. Upon further investigation, looking at each crime type separately, we found some interesting results. Unemployment’s influence on predicting Crime appeared to change depending on the specific crime category. For instance, in cases like Assault and Battery and Intentional Damage and Destruction, Unemployment didn’t seem to significantly impact the calculations. This was counterintuitive for our expectations. Moreover, concerning Burglaries and Theft without Violence, Unemployment emerged as a more critical factor. It became the primary predictor for the occurrence of these crime types.

We then delved into the potential impact of Poverty on predicting Crime in France. We examined two datasets: one reflecting the number of people affected by poverty in each town (‘Poverty_2019’) and another showing how far individuals were to the poverty threshold (‘Intensity_Poverty’). Again, we analyzed each crime category separately for accuracy. In certain crimes, such as Assault and Battery and Intentional Damage and Destruction, poverty and its intensity significantly influenced crime rates. We then faced a very surprising result when computing the regression with the Burglaries, in this case, Poverty seems to have an important negative impact while the Intensity of Poverty has a postive impact. The VIF factor was also under 5 meaning that multicollinearity was not the reason for that result. This leads us to the interpretation that when talking about Burglaries, the amount of poor people in a town would have a negative impact since Richer neighbourhoods are more likely to be burglarized. In the case of Intensity of Poverty, this varibale is not an indicator of how many people are poor, but rather how poor they are we could assume that the poorest people who burglarize. Regarding Drug Use crimes, Poverty emerged as a significant predictor, but with notably lesser impact compared to previous regressions. Similarly, when examining ‘Theft without Violence,’ while ‘Intensity_Poverty’ showed promise, it didn’t reach statistical significance. In essence, while Poverty and its intensity play substantial roles in predicting criminal behavior in France, their impact varies based on the specific crime type, often leading to conflicting trends.

Moving on to the impact of immigration on crime rates in France, our analyses revealed nuanced outcomes. For instance, in ‘Assault and Battery’ and ‘Intentional Damage and Destruction’ cases, the Immigration variable appeared insignificant, with minimal influence. However, in Burglaries and Drug-related crimes, Immigration seemed to exert a significant effect. Nonetheless, its impact on ‘Theft without Violence’ remained uncertain due to a low Adjusted R-Squared value, cautioning against definitive conclusions. Overall, while Immigration played a significant role in some models, its impact varied considerably among different crime types.

We also explored the role of Population Density in predicting criminal activity. Interestingly, Density didn’t seem to affect Burglaries much, possibly because criminals find it easier to operate in less crowded, rural areas, For example, in the department of Gironde, where Bordeaux is located, the city with the most burglaries. Concerning the remaining crime types, Density is always significant, but has a small coefficient. We observe that some of the most criminal cities in France, such as Marseille or Lyon are not very dense. This explains why Density may not be the best factor to explain Crime.

In terms of Education level, the data suggests that the population without a diploma doesn’t always have the highest crime rates, which might be unexpected. For instance, in the case of Assault and Battery, our findings indicate that crime rates do relate to education level, but the impact is minimal. In many areas across France, especially in the northeastern mining regions, there’s a large number of individuals with diplomas, yet the crime rates there is remarkably low. However, in La Courneuve (93) a city in the Paris region with a notably high crime rate, there is a considerable proportion of people without a diploma. Furthermore, the absence of a diploma seems to have a negative correlation with Burglary rates, with a significant factor of -17. This isn’t surprising as it is a wealthier region, implying that cities with a higher education level might be the primary targets for burglaries. Regarding Drug Use, it was unexpected to see that the absence of a diploma negatively impacts the regression. This could be explained by acknowledging that drug use is widespread among the population in France, affecting even the more educated regions.

Lastly, we explored whether there was a connection between voting patterns in the 2017 presidential election and the crime rates. We aimed to understand if the way people voted affected the crimes committed in those areas and if political behavior correlated with observed criminality. While the AIC eliminated our Voting variables in most of our regressions, an interesting negative relationship emerged between Marine Le Pen “winning” the city and both Drug Use and Intentional Damage and Destruction regressions. For instance, if a city voted in majority for Marine Le Pen in 2017, it had on average an Intentional Damage and Destruction rate 0.362 lower than a city that voted against her. That decrease was 0.319. in the Drug Use category.. However, apart from these instances, the Le Pen variables weren’t significant enough to draw conclusive remarks for our other variables.

4.4 X. RoadMap

What is left to do for our report?

As things stand, we still need to run our regression and analyse its results. We also need to answer our research questions.

If possible we would also like to run a PCA analysis on the different types of crimes to see if certain states experience a type of crime more often than others, and if some crimes can be related.

We would also like to tweak various parts of our report, for example, we still need to add more graphs and at least one interactive map using the leaflet package.

We would also like to develop our EDA section by adding more information on our raw databases, with information regarding the missing values or the distribution of values.